{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"../../img/ods_stickers.jpg\" />"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "## 心血管疾病数据探索分析"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "本次挑战中，您将回答有关心血管疾病（CVD）数据集的问题，并预测患者是否存在心血管疾病（CVD）风险。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 初步数据分析"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "首先，导入挑战所需模块："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import warnings\n",
    "\n",
    "import matplotlib\n",
    "import matplotlib.pyplot as plt\n",
    "import matplotlib.ticker\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import seaborn as sns\n",
    "from matplotlib import rcParams\n",
    "\n",
    "warnings.filterwarnings(\"ignore\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "本次挑战将优先选择 Seaborn 用于绘图，下面我们定义 Seaborn 全局绘图参数，保证后续图像更加整洁美观。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sns.set()\n",
    "sns.set_context(\n",
    "    \"notebook\", font_scale=1.5, rc={\"figure.figsize\": (11, 8), \"axes.titlesize\": 18}\n",
    ")\n",
    "\n",
    "rcParams[\"figure.figsize\"] = 11, 8"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "接下来，读取并预览数据集。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.read_csv(\"../../data/mlbootcamp5_train.csv\", sep=\";\")\n",
    "print(\"Dataset size: \", df.shape)\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "数据集特征表示如下："
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "| Feature | Variable Type | Variable      | Value Type |\n",
    "|---------|--------------|---------------|------------|\n",
    "| Age | Objective Feature | age | int (days) |\n",
    "| Height | Objective Feature | height | int (cm) |\n",
    "| Weight | Objective Feature | weight | float (kg) |\n",
    "| Gender | Objective Feature | gender | categorical code |\n",
    "| Systolic blood pressure | Examination Feature | ap_hi | int |\n",
    "| Diastolic blood pressure | Examination Feature | ap_lo | int |\n",
    "| Cholesterol | Examination Feature | cholesterol | 1: normal, 2: above normal, 3: well above normal |\n",
    "| Glucose | Examination Feature | gluc | 1: normal, 2: above normal, 3: well above normal |\n",
    "| Smoking | Subjective Feature | smoke | binary |\n",
    "| Alcohol intake | Subjective Feature | alco | binary |\n",
    "| Physical activity | Subjective Feature | active | binary |\n",
    "| Presence or absence of cardiovascular disease | Target Variable | cardio | binary |"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "所以数据均为医学检查时收集，主要包含 3 种类别的特征："
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Objective Feature: 基础事实数据。\n",
    "- Examination Feature: 体检结果。\n",
    "- Subjective Feature: 患者给出的主观信息。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "接下来，我们探索数据值分布情况。这里使用 [`catplot()`](https://seaborn.pydata.org/generated/seaborn.catplot.html) 绘制出变量特征的计数条形图。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_uniques = pd.melt(\n",
    "    frame=df,\n",
    "    value_vars=[\"gender\", \"cholesterol\", \"gluc\", \"smoke\", \"alco\", \"active\", \"cardio\"],\n",
    ")\n",
    "df_uniques = (\n",
    "    pd.DataFrame(df_uniques.groupby([\"variable\", \"value\"])[\"value\"].count())\n",
    "    .sort_index(level=[0, 1])\n",
    "    .rename(columns={\"value\": \"count\"})\n",
    "    .reset_index()\n",
    ")\n",
    "\n",
    "sns.catplot(\n",
    "    x=\"variable\", y=\"count\", hue=\"value\", data=df_uniques, kind=\"bar\", height=12\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "接下来，让我们按目标值分割数据集，这样往往可以通过绘图结果快速找出相对重要的特征。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_uniques = pd.melt(\n",
    "    frame=df,\n",
    "    value_vars=[\"gender\", \"cholesterol\", \"gluc\", \"smoke\", \"alco\", \"active\"],\n",
    "    id_vars=[\"cardio\"],\n",
    ")\n",
    "df_uniques = (\n",
    "    pd.DataFrame(df_uniques.groupby([\"variable\", \"value\", \"cardio\"])[\"value\"].count())\n",
    "    .sort_index(level=[0, 1])\n",
    "    .rename(columns={\"value\": \"count\"})\n",
    "    .reset_index()\n",
    ")\n",
    "\n",
    "sns.catplot(\n",
    "    x=\"variable\",\n",
    "    y=\"count\",\n",
    "    hue=\"value\",\n",
    "    col=\"cardio\",\n",
    "    data=df_uniques,\n",
    "    kind=\"bar\",\n",
    "    height=9,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "您可以看到胆固醇和葡萄糖水平对目标变量影响明显较大。这是巧合吗？"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "接下来，你需要自行补充必要的代码来回答相应的挑战问题。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 进一步观察"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<i class=\"fa fa-question-circle\" aria-hidden=\"true\"> 问题：</i>数据集中有多少男性和女性？由于 `gender` 特征没有说明男女，你需要通过分析身高计算得出。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- [ A ] 45530 女性 和 24470 男性\n",
    "- [ B ] 45530 男性 和 24470 女性\n",
    "- [ C ] 45470 女性 和 24530 男性\n",
    "- [ D ] 45470 男性 和 24530 女性"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<i class=\"fa fa-question-circle\" aria-hidden=\"true\"> 问题：</i>数据集中男性和女性，哪个群体饮酒的频次更高？"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- [ A ] 女性\n",
    "- [ B ] 男性"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<i class=\"fa fa-question-circle\" aria-hidden=\"true\"> 问题：</i>数据集中男性和女性吸烟者所占百分比的差值是多少？"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- [ A ] 4\n",
    "- [ B ] 16\n",
    "- [ C ] 20\n",
    "- [ D ] 24"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<i class=\"fa fa-question-circle\" aria-hidden=\"true\"> 问题：</i>数据集中吸烟者和非吸烟者的年龄中位数之间的差值（以月计）近似是多少？你需要尝试确定出数据集中 `age` 合理的表示单位。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- [ A ] 5\n",
    "- [ B ] 10\n",
    "- [ C ] 15\n",
    "- [ D ] 20"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "本次挑战规定 1 年为 365.25 天。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 风险量表图"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "欧洲心脏病学会的网站上给出了 [<i class=\"fa fa-external-link-square\" aria-hidden=\"true\"> SCORE scale</i>](https://www.escardio.org/Education/Practice-Tools/CVD-prevention-toolbox/SCORE-Risk-Charts) 量表。它可以用于计算未来 10 年心血管疾病死亡的风险。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src='https://doc.shiyanlou.com/courses/uid214893-20190505-1557034697564' width=50%>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "让我们来看看最右上角的矩形，也就是 60 到 65 岁的吸烟男性的子集。其中，矩形的左下角看到一个值 9，在右上角看到 47。量表意味着对于收缩压（纵坐标）低于 120 （正常血压）的性别年龄组的人来说，心血管疾病的风险估计比收缩压为 $[160, 180)$ 的患者（高血压）低 5 倍（47/9）。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "接下来，让我们结合量表，并利用挑战数据集进行计算。这里需要注意的是，量表中胆固醇（Cholesterol）水平和挑战数据单位不太一样，我们使用对应关系为：`4 mmol/l` $\\rightarrow$ `1`, `5-7 mmol/l` $\\rightarrow$ `2`, `8 mmol/l` $\\rightarrow$ `3`。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<i class=\"fa fa-question-circle\" aria-hidden=\"true\"> 问题：</i>计算 $[60, 65)$ 年龄区间下，较健康人群（胆固醇类别 1，收缩压低于 120）与高风险人群（胆固醇类别为 3，收缩压 $[160, 180)$）各自心血管病患所占比例。并最终求得二者比例的近似倍数。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- [ A ] 1\n",
    "- [ B ] 2\n",
    "- [ C ] 3\n",
    "- [ D ] 4"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "选择 $[60, 65)$ 年龄区间："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "smoking_old_men = df[\n",
    "    (df[\"gender\"] == 2)\n",
    "    & (df[\"age_years\"] >= 60)\n",
    "    & (df[\"age_years\"] < 65)\n",
    "    & (df[\"smoke\"] == 1)\n",
    "]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "smoking_old_men[\n",
    "    (smoking_old_men[\"cholesterol\"] == 1) & (smoking_old_men[\"ap_hi\"] < 120)\n",
    "][\"cardio\"].mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "胆固醇类别为 1，收缩压低于 120，心血管疾病患者的比例为 26%。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "smoking_old_men[\n",
    "    (smoking_old_men[\"cholesterol\"] == 3)\n",
    "    & (smoking_old_men[\"ap_hi\"] >= 160)\n",
    "    & (smoking_old_men[\"ap_hi\"] < 180)\n",
    "][\"cardio\"].mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "胆固醇类别为 3，收缩压 $[160, 180)$，心血管疾病患者的比例为 86%。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "所以，挑战数据给出的比例大约是 $\\frac{0.86}{0.26} \\approx 3$ 倍，而量表给出的是 5 倍。当然，这依赖与给定年龄组中的病人比例。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### BMI 指数分析"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "挑战将需要创建一个新的特征 BMI，BMI 为身高体重指数，其可以反映体重的标准情况，计算公式为："
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$BMI = \\frac{weight (kg)}{height^2 (m)}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "正常 BMI 指数一般在 18.5 到 25 之间。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<i class=\"fa fa-question-circle\" aria-hidden=\"true\"> 问题：</i>请选择下面叙述正确的有："
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- [ A ] 数据集样本中 BMI 中位数在正常范围内。\n",
    "- [ B ] 女性的平均 BMI 指数高于男性。\n",
    "- [ C ] 健康人群的 BMI 平均高于患病人群。\n",
    "- [ D ] 健康和不饮酒男性中，BMI 比健康不饮酒女性更接近正常值。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 数据清洗"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "你可能会注意到给出的数据并不够完美，在进一步可视化之前，我们需要对数据进行清洗。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<i class=\"fa fa-question-circle\" aria-hidden=\"true\"> 问题：</i>请按照以下列举的项目，过滤掉数据中统计有误的部分："
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- 血压特征中，舒张压高于收缩压的样本。\n",
    "- 身高特征中，低于 2.5％ 分位数的样本。\n",
    "- 身高特征中，高于 97.5％ 分位数的样本。\n",
    "- 体重特征中，低于 2.5％ 分位数的样本。\n",
    "- 体重特征中，高于 97.5％ 分位数的样本。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "百分位数请使用 `pd.Series.quantile` 方法进行确定，不熟悉可以阅读 [<i class=\"fa fa-external-link-square\" aria-hidden=\"true\"> 官方文档</i>](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.quantile.html)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<i class=\"fa fa-question-circle\" aria-hidden=\"true\"> 问题：</i>清洗掉的数据占原数据总量的近似百分比？"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- [ A ] 8%\n",
    "- [ B ] 9%\n",
    "- [ C ] 10%\n",
    "- [ D ] 11%"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 数据可视化分析"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "要更好地理解数据集特征，接下来使用过滤之后的数据创建特征之间相关系数的矩阵。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<i class=\"fa fa-question-circle\" aria-hidden=\"true\"> 问题：</i>使用 [`heatmap()`](http://seaborn.pydata.org/generated/seaborn.heatmap.html) 绘制特征之间的皮尔逊相关性系数矩阵。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<i class=\"fa fa-question-circle\" aria-hidden=\"true\"> 问题：</i>以下哪组特征与性别的相关性更强？"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- [ A ] Cardio, Cholesterol\n",
    "- [ B ] Height, Smoke\n",
    "- [ C ] Smoke, Alco\n",
    "- [ D ] Height, Weight"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 男女身高分布"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "前面的探索中，我们知道性别对应 1 和 2，虽然不知道不同性别对应哪个值，但可以通过平均身高和体重来确定。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<i class=\"fa fa-question-circle\" aria-hidden=\"true\"> 问题：</i>绘制身高和性别之间的小提琴图 [`violinplot()`](https://seaborn.pydata.org/generated/seaborn.violinplot.html)。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "这里建议通过 `hue` 参数按性别划分，并通过 `scale` 参数来计算性别对应的具体数量。为了便于你能正确绘制，这里给出一个 [<i class=\"fa fa-external-link-square\" aria-hidden=\"true\"> 参考示例</i>](https://stackoverflow.com/questions/41573283/seaborn-violin-plot-with-one-data-per-column/41575149#41575149)。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<i class=\"fa fa-question-circle\" aria-hidden=\"true\"> 问题：</i>绘制身高和性别之间的核密度图 [`kdeplot`](https://seaborn.pydata.org/generated/seaborn.kdeplot.html)。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "通过核密度图可以更清楚地看到性别之间的差异，但却无法得到每个性别对应的具体人数。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "大多数情况下，皮尔逊相关性指数可以看出特征之间的相关程度。不过，这里我们进一步绘制 [<i class=\"fa fa-external-link-square\" aria-hidden=\"true\"> Spearman's rank correlation coefficient</i>](https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient) 斯皮尔曼等级相关系数对应的图像。它利用单调方程评价两个统计变量的相关性，是用于衡量两个变量的依赖性的非参数指标。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<i class=\"fa fa-question-circle\" aria-hidden=\"true\"> 问题：</i>使用 [`heatmap()`](http://seaborn.pydata.org/generated/seaborn.heatmap.html) 绘制特征之间的斯皮尔曼等级相关系数矩阵。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<i class=\"fa fa-question-circle\" aria-hidden=\"true\"> 问题：</i>下列那一组特征具有最强的 Spearman 相关性？"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- [ A ] Height, Weight\n",
    "- [ B ] Age, Weight\n",
    "- [ C ] Cholesterol, Gluc\n",
    "- [ D ] Cardio, Cholesterol\n",
    "- [ E ] Ap_hi, Ap_lo\n",
    "- [ F ] Smoke, Alco"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 年龄可视化"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "上面，我们已经计算过受访者的年龄。接下来，我们对其进行可视化。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<i class=\"fa fa-question-circle\" aria-hidden=\"true\"> 问题：</i>请使用 [`countplot()`](http://seaborn.pydata.org/generated/seaborn.countplot.html) 绘制年龄分布计数图，横坐标为年龄，纵坐标为对应的人群数量。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<i class=\"fa fa-question-circle\" aria-hidden=\"true\"> 问题：</i>在哪个年龄下，心血管疾病患者人数首次超过无心血管疾病患者人数？"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- [ A ] 44\n",
    "- [ B ] 55\n",
    "- [ C ] 64\n",
    "- [ D ]  70"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div style=\"background-color: #e6e6e6; margin-bottom: 10px; padding: 1%; border: 1px solid #ccc; border-radius: 6px;text-align: center;\"><a href=\"https://nbviewer.jupyter.org/github/shiyanlou/mlcourse-answers/tree/master/\" title=\"挑战参考答案\"><i class=\"fa fa-file-code-o\" aria-hidden=\"true\"> 查看挑战参考答案</i></a></div>"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
